Ayodeji: Implemented blocked and parallel matrix multiplication #9

Open
Deji10 wants to merge 1 commit into AA-parallel-computing:main from Deji10:ayodeji-ibrahim

Deji10 commented Jun 1, 2026

Implementation of blocked and parallel matrix multiplication for Assignment 4.

Implementation

  • naive_matmul: Standard triple-nested loop baseline
  • blocked_matmul: Cache-blocked (tiled) version; block_size defaults to 64, although a block size of 16 was measured to be fastest
  • parallel_matmul: OpenMP-based with #pragma omp parallel for collapse(2)
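For readers who have not opened the diff, the three variants listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names come from the list above, but the signatures, the `Matrix` alias, and the row-major `std::vector<float>` layout are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout (not taken from the PR): row-major storage,
// A is m x k, B is k x n, C is m x n.
using Matrix = std::vector<float>;

// Baseline: standard triple-nested loop.
void naive_matmul(const Matrix& A, const Matrix& B, Matrix& C, int m, int k, int n) {
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    assert(C.size() == static_cast<std::size_t>(m) * n);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = sum;
        }
    }
}

// Blocked: work on block_size x block_size tiles so the data touched
// by the inner loops is more likely to stay in cache.
void blocked_matmul(const Matrix& A, const Matrix& B, Matrix& C,
                    int m, int k, int n, int block_size = 64) {
    assert(block_size > 0);
    std::fill(C.begin(), C.end(), 0.0f);  // tiles accumulate into C
    for (int ii = 0; ii < m; ii += block_size) {
        for (int jj = 0; jj < n; jj += block_size) {
            for (int pp = 0; pp < k; pp += block_size) {
                const int i_end = std::min(ii + block_size, m);
                const int j_end = std::min(jj + block_size, n);
                const int p_end = std::min(pp + block_size, k);
                for (int i = ii; i < i_end; ++i) {
                    for (int p = pp; p < p_end; ++p) {
                        const float a = A[i * k + p];
                        for (int j = jj; j < j_end; ++j) C[i * n + j] += a * B[p * n + j];
                    }
                }
            }
        }
    }
}

// Parallel: collapse(2) merges the i and j loops into one iteration
// space; each (i, j) writes a distinct element of C, so no locking is needed.
void parallel_matmul(const Matrix& A, const Matrix& B, Matrix& C, int m, int k, int n) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

With g++, the pragma only takes effect when compiling with `-fopenmp`; without that flag it is ignored and `parallel_matmul` runs serially with the same result.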

Methodology

  • All timings are averaged over 5 independent runs to reduce measurement noise
  • Three run modes added: default, blocks (block-size sweep), and threads (thread-count sweep)
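The averaging described above could look something like the sketch below. The helper name `average_ms` and its signature are hypothetical and chosen for illustration; the PR's actual timing code may differ.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs fn `runs` times and returns the mean
// wall-clock time per run in milliseconds.
template <typename Fn>
double average_ms(Fn&& fn, int runs = 5) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, suited to interval timing
    double total_ms = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = Clock::now();
        fn();
        const auto stop = Clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

A sweep mode would then call this helper once per block size or thread count, wrapping the multiplication in a lambda, and report each mean relative to the naive baseline.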

Key Results

  • All 10 test cases pass correctness validation against output.raw (tolerance 1e-2)
  • Block size experiment: tested 16/32/64/128 → block size 16 gives the best speedup (2.33×) on Case 7; the conventional default of 64 was not optimal
  • Thread count experiment: tested 1/2/4/8 → 2 threads is optimal (1.52×) on the 2-core Codespaces environment; 8 threads is slower than 1, which is attributed to scheduling overhead from oversubscribing the 2 cores
  • Combined optimum (block size 16 + 2 threads) gives ~3× speedup over naive
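As an illustration of what a tolerance check of this kind might look like, here is a small sketch. It assumes an absolute, element-wise tolerance, which the description above does not specify, and the function name `matches_within` is hypothetical. Reading `output.raw` is left out because its file format is not described here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: true if both results have the same length and every
// element differs by at most `tol` (absolute difference, an assumption).
bool matches_within(const std::vector<float>& got,
                    const std::vector<float>& expected,
                    float tol = 1e-2f) {
    if (got.size() != expected.size()) return false;
    for (std::size_t i = 0; i < got.size(); ++i) {
        // Written as !(diff <= tol) so that a NaN difference fails the check.
        if (!(std::fabs(got[i] - expected[i]) <= tol)) return false;
    }
    return true;
}
```

A tolerance is needed rather than exact equality because the blocked and parallel versions may add the partial products in a different order than the reference, and floating-point addition is not associative.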

Full performance tables and analysis available in README.md.

Challenges

  • The local Windows machine had no g++ installed, so the work was completed in GitHub Codespaces
  • The Codespaces 2-core CPU limit caps the achievable parallel speedup
  • The provided test cases are small (the largest is 256 × 300); larger matrices would likely show the benefit of parallelism more clearly
  • Submitted late due to setup issues

Co-authored with Muhammad Zahid.

Co-authored-by: Muhammad Zahid <muhammad.zahid@example.com>
Deji10 force-pushed the ayodeji-ibrahim branch from 7656d49 to 29aebfc on June 1, 2026 13:55